

Alan Turing and the development of Artificial Intelligence


Stephen Muggleton

During the centennial year of his birth, Alan Turing (1912-1954) has been widely celebrated as having laid the foundations for Computer Science, Automated Decryption, Systems Biology and the Turing Test. In this paper we investigate Turing’s motivations and expectations for the development of Machine Intelligence, as expressed in his 1950 article in Mind. We show that many of the trends and developments within AI over the last 50 years were foreseen in this foundational paper. In particular, Turing not only describes the use of Computational Logic but also the necessity for the development of Machine Learning in order to achieve human-level AI within a 50-year time-frame. His description of the Child Machine (a machine which learns like an infant) dominates the closing section of the paper, in which he provides suggestions for how AI might be achieved. Turing discusses three alternative suggestions which can be characterised as: 1) AI by programming, 2) AI by ab initio machine learning and 3) AI using logic, probabilities, learning and background knowledge. He argues that there are inevitable limitations in the first two approaches and recommends the third as the most promising. We compare Turing’s three alternatives to developments within AI, and conclude with a discussion of some of the unresolved challenges he posed within the paper.

Keywords: Alan Turing, Artificial Intelligence, Machine Intelligence

1. Introduction 

In this section we first review relevant parts of Alan Turing’s early work, which pre-dated his paper in Mind [42].

1.1. Early work: the Entscheidungsproblem 

Turing’s initial investigations of computation stemmed from the programme set out by David Hilbert at the 1928 International Mathematical Congress. Hilbert presented three open questions for logic and mathematics. Was mathematics

1. complete, in the sense that any mathematical assertion could either be proved or disproved,

2. consistent, in the sense that false statements could not be derived by a sequence of valid steps, and

3. decidable, in the sense that there exists a definite method to decide the truth or falsity of every mathematical assertion?

Within three years Kurt Gödel [13] had shown that not even the axioms of arithmetic could be both complete and consistent, and by 1937 both Alonzo Church [5] and Alan Turing [43] had demonstrated the undecidability of particular mathematical assertions.

While Gödel and Church had demonstrated their results using purely mathematical calculi, Turing took the unusual route of treating mathematical proof as an artifact of human reasoning. He thus considered a physical machine which emulated a human mathematician working with pen and paper and following a series of instructions. Turing then generalised this notion to a universal machine which could emulate all other computing machines. He used this construct to show that certain functions cannot be computed by such a universal machine, and consequently demonstrated the undecidability of assertions associated with such functions.
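Turing’s idea of a single machine that runs any other machine supplied as a table of instructions can be sketched in a few lines of Python. Here `run_tm` is one fixed program, and the machine it executes is plain data: a transition table mapping (state, symbol read) to (symbol to write, head move, next state). This is only an illustration under our own encoding choices, a Python interpreter rather than a universal machine encoded on its own tape; the binary-increment machine and all names are hypothetical.

```python
from typing import Dict, Tuple

# (state, symbol read) -> (symbol to write, head move: -1 left / +1 right / 0 stay, next state)
Table = Dict[Tuple[str, str], Tuple[str, int, str]]


def run_tm(table: Table, tape: str, state: str, head: int,
           halt: str = "halt", blank: str = "_", max_steps: int = 10_000) -> str:
    """Run the machine described by `table` and return the non-blank tape contents."""
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    steps = 0
    while state != halt:
        if steps == max_steps:
            raise RuntimeError("no halt within max_steps")
        write, move, state = table[(state, cells.get(head, blank))]
        cells[head] = write
        head += move
        steps += 1
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Hypothetical example machine: add one to a binary number, starting on its last digit.
INCREMENT: Table = {
    ("carry", "1"): ("0", -1, "carry"),  # 1 + carry = 0, carry moves left
    ("carry", "0"): ("1", 0, "halt"),    # 0 + carry = 1, done
    ("carry", "_"): ("1", 0, "halt"),    # ran off the left end: new leading 1
}

print(run_tm(INCREMENT, "1011", state="carry", head=3))  # 1100
```

Swapping in a different table changes what is computed without touching `run_tm`, which is the point of the illustration.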

At the heart of Turing’s universal machine is a model of human calculation. It was this choice which set the scene for later discussions on the degree to which computers might be capable of more sophisticated human-level reasoning.

1.2. Bletchley Park 

The outbreak of the Second World War provided Turing with the opportunity and resources to design and test a machine which would emulate human reasoning. As the UK’s main wartime decryption centre, Bletchley Park had recruited many of the UK’s best mathematicians in an attempt to decode German military messages. By 1940 the Bombe machine, designed by Turing and Welchman [7], had gone into operation and was efficiently decrypting messages using methods previously employed manually by human decoders. In keeping with Turing’s background in Mathematical Logic, the Bombe worked according to a reductio ad absurdum principle, which reduced the hypothesis space of 26^3 (17,576) possible settings for the Enigma machine to a small number of possibilities based on a given set of message transcriptions.
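The hypothesis-elimination idea can be shown on a toy scale. The Python sketch below does not model the Enigma or the Bombe, which were far more intricate: it uses a much simpler Vigenère-style cipher whose key is three shifts, giving the same count of 26^3 candidate settings. Every setting is tried against a crib (a guessed fragment of plaintext) and discarded if it contradicts the observed ciphertext. The cipher, crib and function names are hypothetical.

```python
from itertools import product
from string import ascii_uppercase as ALPHABET


def encrypt(plaintext: str, key: tuple) -> str:
    """Toy cipher: shift the i-th letter forward by key[i % len(key)]."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + key[i % len(key)]) % 26]
        for i, ch in enumerate(plaintext)
    )


def surviving_keys(crib: str, ciphertext: str, key_len: int = 3) -> list:
    """Enumerate all 26**key_len settings and keep those consistent with the crib."""
    return [
        key
        for key in product(range(26), repeat=key_len)
        if encrypt(crib, key) == ciphertext[: len(crib)]
    ]


secret = (3, 14, 7)  # hypothetical setting to be recovered
intercept = encrypt("WEATHERREPORT", secret)
print(len(surviving_keys("WE", intercept)))  # 26: the third shift is still unconstrained
print(surviving_keys("WEATHER", intercept))  # [(3, 14, 7)]
```

A short crib leaves many settings standing; a longer one eliminates all but the true setting, mirroring the way transcriptions narrowed the Bombe’s search.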

The hypothesis elimination principle of the Bombe was later refined in the design of the Colossus I and II machines. The Tunny report [15] (declassified by the UK government in 2000) shows that one of the key technical refinements of Colossus was the use of Bayesian reasoning to order the search through the space of hypothetical settings for the Lorenz encryption machine. This combination of logical hypothesis generation with Bayesian evaluation was later to become central to approaches used within Machine Learning (see Section 5). Indeed, strong parallels exist between decryption tasks on the one hand, which involve hypothesising machine settings from a set of message transcriptions, and modern Machine Learning tasks on the other, which involve hypothesising a model from a set of observations. Given their grounding in the Bletchley Park decryption work, it is hardly surprising that two of the authors of the Tunny report, Donald Michie (1923-2007) and Jack Good (1916-2009), went on to play founding roles in the post-war development of Machine Intelligence and Subjective Probabilistic reasoning respectively. In numerous out-of-hours meetings at Bletchley Park, Turing discussed the problem of machine intelligence with both Michie and Good. According to Andrew Hodges [16], Turing’s biographer:

These meetings were an opportunity for Alan to develop the ideas for chess-playing machines that had begun in his 1941 discussions with Jack Good. They often talked about mechanisation of thought processes, bringing in the theory of probability and weight of evidence, with which Donald Michie was by now familiar. ... He (Turing) was not so much concerned with the building of machines designed to carry out this or that complicated task. He was now fascinated with the idea of a machine that could learn.
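The “weight of evidence” mentioned in Hodges’ account can be illustrated briefly. The sketch below assumes its usual definition, the logarithm of the Bayes factor, expressed in decibans (ten times the base-10 logarithm), and uses it to order a handful of hypotheses, in the spirit of the Bayesian ordering of search described in the Tunny report. It is not the Colossus procedure: each hypothesis is a hypothetical model predicting how often some observable event (a “match”) should occur, compared against a null model, and all names and numbers are invented for illustration.

```python
import math


def bernoulli(p_match: float):
    """Likelihood model: probability of observing a match (1) or no match (0)."""
    return lambda obs: p_match if obs == 1 else 1.0 - p_match


def decibans(model, null_model, observations) -> float:
    """Weight of evidence for `model` over `null_model`, in decibans:
    10 * log10 of the Bayes factor, summed over independent observations."""
    return sum(10 * math.log10(model(o) / null_model(o)) for o in observations)


def rank_hypotheses(hypotheses: dict, null_model, observations) -> list:
    """Order hypotheses from most to least supported by the evidence."""
    scores = {name: decibans(m, null_model, observations) for name, m in hypotheses.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


observations = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical match/no-match record
null = bernoulli(0.04)                  # chance rate under the null model
hypotheses = {"H1": bernoulli(0.07), "H2": bernoulli(0.04), "H3": bernoulli(0.01)}

for name, score in rank_hypotheses(hypotheses, null, observations):
    print(f"{name}: {score:+.1f} decibans")
```

Because weights of evidence from independent observations simply add, the search can be ordered by a running total, which is what makes the logarithmic unit convenient.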


2. Turing’s 1950 paper in Mind 

2.1. Structure of the paper 

The opening sentence of Turing’s 1950 paper [42] declares:

I propose to consider the question, “Can machines think?”

The first six sections of the paper provide a philosophical framework for answering this question. These sections are briefly summarised below.

1. The Imitation Game. Often referred to as the “Turing test”, this is a form of parlour game in which a human interrogator alternately questions a hidden computer and a hidden person in an attempt to distinguish the identity of the respondents. The Imitation Game is intended to provide an objective test for deciding whether machines can think.

2. Critique of the New Problem. Turing discusses the advantages of the game for deciding, on the basis of objective human judgement, whether thinking can be attributed to machines and humans on an equal basis.

3. The Machines Concerned in the Game. Turing indicates that he intends digital computers to be the only kind of machine permitted to take part in the game.

4. Digital Computers. The nature of the new digital computers, such as the Manchester machine, is explained and compared to Charles Babbage’s proposals for an Analytical Engine.

5. Universality of Digital Computers. Turing explains how digital computers can emulate any discrete-state machine.

6. Contrary Views on the Main Question. Nine traditional philosophical objections to the proposition that machines can think are introduced and summarily dismissed by Turing.

2.2. Learning machines - Section 7 of Turing’s paper

The task of engineering software which addresses the central question of Turing’s paper has dominated Artificial Intelligence research over the last sixty years. In the final section of the 1950 paper Turing addresses the motivation and possible approaches for such endeavours. He marks the transition from the purely philosophical nature of the first six sections of the paper as follows.

The only really satisfactory support that can be given for the view expressed at the beginning of section 6, will be that provided by waiting for the end of the century and then doing the experiment described. But what can we say in the meantime?

Turing goes on to discuss three distinct strategies which might be considered capable of achieving a thinking machine. These can be characterised as follows: 1) AI by programming, 2) AI by ab initio machine learning and 3) AI using logic, probabilities, learning and background knowledge. In the next three sections we discuss these strategies in relation to various phases of AI research as it has been conducted over the past half century.

3. Version 1: AI by programming [1960s-1980s] 

3.1. Storage capacity argument 

Turing considers an argument concerning the memory requirements for programming a digital computer with a capacity similar to that of a human being.

As I have explained, the problem is mainly one of programming. Advances in engineering will have to be made too, but it seems unlikely that these will not be adequate for the requirements. Estimates of the storage capacity of the brain vary from 10^10 to 10^15 binary digits. I incline to the lower values and believe that only a very small fraction is used for the higher types of thinking. Most of it is probably used for the retention of visual impressions, I should be surprised if more than 10^9 was required for satisfactory playing of the imitation game, at any rate against a blind man. (Note: The capacity of the Encyclopaedia Britannica, 11th edition, is 2 × 10^9). A storage capacity of 10^7, would be a very practicable possibility even by present techniques. It is probably not necessary to increase the speed of operations of the machines at all. Parts of modern machines which can be regarded as analogs of nerve cells work about a thousand times faster than the latter. This should provide a “margin of safety” which could cover losses of speed arising in many ways. Our problem then is to find out how to programme these machines to play the game. At my present rate of working I produce about a thousand digits of programme a day, so that about sixty workers, working steadily through the fifty years might accomplish the job, if nothing went into the wastepaper basket. Some more expeditious method seems desirable.

In retrospect it is remarkable that Turing foresaw that “Advances in engineering” would lead to computers with storage of the order of a Gigabyte by the end of the twentieth century. It is also noteworthy that Turing suggests that, in terms of hardware, it is memory capacity rather than processing speed which will be critical.

However, the final sentence of the quote above indicates that Turing could already foresee that manual composition of a program which could pass the Turing test was not the most “expeditious” method, despite the fact that a dedicated group of around “sixty” programmers might complete the task within “fifty years” if “nothing went into the wastepaper basket”. Turing must already have been acutely aware, from his work on the early Pilot ACE computer, that plenty goes into the wastepaper basket in the process of debugging computer programs.
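Turing’s figures can be checked with a few lines of arithmetic. The Python sketch below multiplies out his rate of a thousand digits a day for sixty workers over fifty years; counting every one of 365 days as a working day is our simplifying assumption, not Turing’s. The total comes to roughly 1.1 × 10^9 digits, in line with his estimate that about 10^9 binary digits would suffice for the imitation game.

```python
# Turing's figures from the 1950 quote; DAYS_PER_YEAR is our own assumption
# (every day is a working day and nothing goes into the wastepaper basket).
DIGITS_PER_DAY = 1_000
WORKERS = 60
YEARS = 50
DAYS_PER_YEAR = 365

total_digits = DIGITS_PER_DAY * WORKERS * YEARS * DAYS_PER_YEAR
target = 10**9  # Turing's estimate of storage needed for the imitation game

print(f"digits written: {total_digits:,}")  # digits written: 1,095,000,000
print(f"ratio to 10^9 target: {total_digits / target:.3f}")  # ratio to 10^9 target: 1.095
```

Any realistic allowance for debugging or discarded work pushes the total below the target, which is the force of Turing’s “if nothing went into the wastepaper basket”.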

3.2. Programming approach to AI and the Machine Intelligence series

Turing’s influence on the development of AI from the 1960s to the 1980s is particularly evident in the Machine Intelligence book series, which acted as a vanguard of cutting-edge AI research during this period. The series’ Executive Editor, Donald Michie, has already been mentioned as one of Turing’s Bletchley colleagues. Michie was also the founder of Europe’s first Department of Artificial Intelligence in the 1960s in Edinburgh, and later founded the Turing Institute (an AI research institute) in the 1980s in Glasgow. Michie specifically chose topics for the Machine Intelligence workshops which were closely related to those which he and Jack Good had discussed with Turing during the war. Indeed, Jack Good was a frequent contributor to the series on Turing-inspired topics such as Computer Chess [14]. To open the Machine Intelligence 5 volume, Michie selected “Intelligent machinery” [44], a previously unpublished article in which Turing discussed the idea of designing intelligent robots which could “roam the countryside” and learn from their experience.

Turing’s Version 1, the programming approach to Artificial Intelligence, was the dominant paradigm for Artificial Intelligence research up until the mid-1980s. Research during this period can largely be divided into three broad areas: 1) Reasoning, 2) Physical perception and 3) Physical action.

Reasoning. Simon and Newell’s General Problem Solver (GPS) [30] was an early and influential attempt to program a universal problem solver which could be applied to a variety of formal symbolic reasoning problems such as theorem proving, geometry and chess playing. It became clear that although GPS could solve simple problems, on more complex tasks its reasoning was rapidly swamped by the combinatorics of the search. Throughout the 1960s-1980s a variety of other, more specific approaches were taken to the problems of improving the efficiency of search (e.g. [24,9]) and planning (e.g. [8,11,17]). Additionally, a variety of more special-purpose techniques were developed for both theorem proving (e.g. [34,20]) and chess playing (e.g. [38,14]).

During the same period, attempts to address the difficulties, foreseen by Turing, of writing effective and efficient AI programs led to the rise of a number of high-level languages. The methodologies on which these were based varied from the use of λ-calculus (e.g. LISP) [21] to the development of stack-based languages (e.g. POP1) [6] as well as languages based on first-order predicate calculus (e.g. Prolog) [46]. The approach of heuristic programming, developed in systems such as Dendral [4] and MYCIN [41], used constraints in the form of rules to produce systems which could reason at the level of human experts. These expert systems became a key demonstrator of the achievements of Artificial Intelligence in the early 1980s.


Physical perception. The 1960s-1980s witnessed a number of early and bold attempts to write programs which could recognise three-dimensional objects within a digital image (e.g. [19,3]). However, these were generally limited to the analysis of simple polygons, and it was unclear how they could be extended to recognise real-world objects such as trees, cars or people.

In the same period considerable advances were made in natural language generation and understanding (e.g. [35,36,37]). Early systems directly addressed one of the key assumptions of Turing’s imitation game by answering questions posed in natural language. However, just as with the initial attempts at computer vision, these natural language systems were limited by the complexity of the grammars provided by their programmers.

Physical action. As mentioned previously, Turing [44] had discussed the idea of intelligent machines which could roam the countryside, learning for themselves. Probably the best known mobile robotics project from the early years was Stanford’s Shakey project (1966-1972) [31]. By contrast, in the Edinburgh Freddy assembly robot [2,1] the robot arm and associated digital camera remained in a fixed position while the computer directed a platform carrying the sequentially assembled parts to move past it.

4. Version 2: AI by ab initio machine learning 

In his 1950 paper Turing had already anticipated the difficulties of developing AI by manually programming a digital computer. His suggested remedy was that machines must learn in the same way as a human child.

Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s? If this were then subjected to an appropriate course of education one would obtain the adult brain. Presumably the child brain is something like a notebook as one buys it from the stationers. Rather little mechanism, and lots of blank sheets. (Mechanism and writing are from our point of view almost synonymous.) Our hope is that there is so little mechanism in the child brain that something like it can be easily programmed. The amount of work in the education we can assume, as a first approximation, to be much the same as for the human child.

4.1. The ab initio Machine Learning movement [1980s-1990s]

During the 1970s the success of the expert systems movement (see Section 3.2) became increasingly stifled by the cost of involving experts in the development and maintenance of large rule-based systems. This problem became known as “Feigenbaum’s bottleneck” [10]. However, early experiments with Meta-Dendral [4], and later Michalski’s Soy Bean expert system [23], showed that rules could be automatically learned by machines from observations. Moreover, Michalski demonstrated that not only was this a more efficient method of building and maintaining expert systems, but it could also result in rules which were more accurate than those of existing human experts. This led to the start of a new series of workshops called Machine Learning [22], led by Ryszard Michalski, Jaime Carbonell and Tom Mitchell. The workshops, which later developed into the International Conference on Machine Learning, were originally based on the format of Donald Michie’s Machine Intelligence workshops.

4.2. The limits of positive and negative examples

A common feature of systems developed within the standard Machine Learning framework is that learning is conducted ab initio (from Turing’s “blank sheets”) using a set of vectors associated with positive and negative classifications. Turing provides a mathematically inspired warning about such an approach.

The use of punishments and rewards can at best be a part of the teaching process. Roughly speaking, if the teacher has no other means of communicating to the pupil, the amount of information which can reach him does not exceed the total number of rewards and punishments applied. By the time a child has learnt to repeat “Casabianca” he would probably feel very sore indeed, if the text could only be discovered by a “Twenty Questions” technique, every “NO” taking the form of a blow.


Turing’s knowledge of information theory [39] had led him to anticipate some of the limitations later uncovered in the 1980s by Valiant’s theory of the learnable [45]. That is, effective ab initio machine learning is necessarily confined to the construction of relatively small chunks of knowledge. Valiant also demonstrated that the expected accuracy of the learned knowledge can be made arbitrarily high, but only given sufficiently many examples, and the number required grows with the size of the knowledge being learned. So, unfortunately, we have to return to Turing’s original question of how to programme the 10^12 bits of memory required to achieve human-level intelligence.
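The two limits can be put side by side in a short Python sketch. `reward_bits` applies Turing’s Twenty Questions argument: if each reward or punishment is a single yes/no signal, it carries at most one bit, so singling out one of N equally likely alternatives needs at least log2 N of them. `pac_examples` uses the standard textbook sample-complexity bound for a learner that outputs a hypothesis consistent with the data, over a finite hypothesis space of 2^b candidates (one for every b-bit description); this is a common modern form of the PAC result, not Valiant’s original statement, and both function names are hypothetical.

```python
import math


def reward_bits(num_alternatives: int) -> int:
    """Minimum number of yes/no signals needed to single out one of
    num_alternatives equally likely possibilities (one bit per signal)."""
    return math.ceil(math.log2(num_alternatives))


def pac_examples(knowledge_bits: float, epsilon: float, delta: float) -> int:
    """Examples sufficient for a consistent learner over a hypothesis space of
    2**knowledge_bits candidates to reach error <= epsilon with probability
    >= 1 - delta, using m >= (ln|H| + ln(1/delta)) / epsilon."""
    ln_h = knowledge_bits * math.log(2)  # ln(2**b) without building 2**b
    return math.ceil((ln_h + math.log(1 / delta)) / epsilon)


# A 100-letter text over a 26-letter alphabet, recovered by Twenty Questions.
print(reward_bits(26**100))  # 471
# A small chunk of knowledge versus Turing-scale storage.
print(pac_examples(10, epsilon=0.1, delta=0.05))  # 100
print(pac_examples(10**12, epsilon=0.1, delta=0.05) > 10**12)  # True
```

Under these assumptions the bound grows linearly in the number of bits to be learned, so knowledge on the scale discussed above would need roughly 7 × 10^12 examples, which is the practical force of the argument.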

5. Version 3: AI using logic, probabilities, learning and background knowledge

Turing’s answer to the problems which beset ab initio machine learning follows immediately on from the quote given in the previous section.

It is necessary therefore to have some other “unemotional” channels of communication. If these are available it is possible to teach a machine by punishments and rewards to obey orders given in some language, e.g., a symbolic language. These orders are to be transmitted through the “unemotional” channels. The use of this language will diminish greatly the number of punishments and rewards required.

Turing’s claim is that by employing an “unemotional” symbolic language it should be possible to reduce the number of examples required for learning.

5.1. Logic-based learning with background knowledge

The obvious question is the appropriate form and function of the symbolic language to be employed. Again, Turing’s suggestions follow immediately on from the last quote.

Opinions may vary as to the complexity which is suitable in the child machine. One might try to make it as simple as possible consistent with the general principles. Alternatively one might have a complete system of logical inference “built in”. In the latter case the store would be largely occupied with definitions and propositions.

Alan Robinson’s introduction [34] of resolution-based automatic theorem proving in 1965 led to an explosion of interest in the use of first-order predicate calculus as a representation for reasoning within AI systems. In line with Turing’s idea of using “built-in” logical definitions, Gordon Plotkin’s thesis [32] used resolution theorem proving as the context for investigating a form of machine learning which involves hypothesising logical axioms from observations and background knowledge. Within the era of Logic Programming [18] in the 1980s, these early investigations by Plotkin were taken up again by Shapiro [40] in the context of using inductive inference for automatically revising Prolog programs. However, it was not until the 1990s that the school of Inductive Logic Programming [25,26,28] started to investigate this approach in depth as a highly expressive Machine Learning paradigm. A recent survey of the field [29] points to the maturity of theory, implementation and applications in this area.
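To make the idea of hypothesising axioms from observations and background knowledge concrete, the Python sketch below performs a deliberately naive generate-and-test search. It is not Plotkin’s method, nor that of Shapiro’s system or any Inductive Logic Programming system: it simply enumerates every clause of the form target(X, Y) :- p(X, Z), q(Z, Y) over the background predicates and keeps the first one that covers all positive examples and no negative ones. The family facts, predicate names and function names are hypothetical. In the spirit of Turing’s remark about reducing the number of rewards and punishments, the background knowledge lets two positive and two negative examples single out the grandparent rule.

```python
from itertools import product

# Hypothetical background knowledge: predicate name -> set of ground (arg1, arg2) facts.
BACKGROUND = {
    "parent": {("ann", "bob"), ("bob", "carl"), ("bob", "dora"), ("eve", "fay")},
    "sibling": {("carl", "dora"), ("dora", "carl")},
}


def covers(p, q, x, y, bk):
    """True if target(X, Y) :- p(X, Z), q(Z, Y) proves target(x, y) from bk."""
    return any(a == x and (z, y) in bk[q] for (a, z) in bk[p])


def learn_chain_rule(positives, negatives, bk):
    """Return the first chain clause consistent with all examples, or None."""
    for p, q in product(sorted(bk), repeat=2):
        complete = all(covers(p, q, x, y, bk) for x, y in positives)
        consistent = not any(covers(p, q, x, y, bk) for x, y in negatives)
        if complete and consistent:
            return f"target(X, Y) :- {p}(X, Z), {q}(Z, Y)."
    return None


rule = learn_chain_rule(
    positives=[("ann", "carl"), ("ann", "dora")],
    negatives=[("ann", "bob"), ("bob", "carl")],
    bk=BACKGROUND,
)
print(rule)  # target(X, Y) :- parent(X, Z), parent(Z, Y).
```

Real systems search far richer clause spaces with pruning and scoring; the exhaustive loop here only shows where the background knowledge enters the test of each hypothesis.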

5.2. Uncertainty and probabilistic learning

Turing makes some interesting observations concerning the uncertainty of learned rules.

Processes that are learnt do not produce a hundred per cent certainty of result; if they did they could not be unlearnt.

Over the last decade there has been increasing interest in incorporating probabilities into Inductive Logic Programming [33,12]. These probability values are used to indicate the uncertainty of learned rules. Turing also makes the following point concerning the ephemeral nature of learning.

The idea of a learning machine may appear paradoxical to some readers. How can the rules of operation of the machine change? They should describe completely how the machine will react whatever its history might be, whatever changes it might undergo. The rules are thus quite time-invariant. This is quite true. The explanation of the paradox is that the rules which get changed in the learning process are of a rather less pretentious kind, claiming only an ephemeral validity.


It is in the nature of a Universal Turing machine that it acts as a meta-logical interpreter. It is this property which allows rules to be treated as data, so that they can be altered and updated. A recent paper [27] by the author demonstrates that the meta-interpretive nature of the Prolog Logic Programming language can be used to efficiently support the introduction of auxiliary ‘invented’ predicates and recursion within the context of learning complex grammars.
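The distinction Turing draws, between fixed rules of operation and learned rules of only ephemeral validity, can be illustrated with a very small interpreter. In the Python sketch below, `prove` plays the part of the time-invariant machine: it never changes. The learned rules are ordinary data (a dictionary of propositional Horn clauses), so “learning” or “unlearning” a rule is just an update to that dictionary. This is a propositional toy written in Python rather than Prolog, and it is not the meta-interpretive learning system of [27]; the atom names are hypothetical.

```python
def prove(goal, rules, depth=0, max_depth=20):
    """Backward-chain over propositional Horn clauses stored as data.

    rules maps each head atom to a list of alternative bodies; each body is a
    list of sub-goals, and a fact is a clause with an empty body.
    """
    if depth > max_depth:  # crude guard against looping on recursive rules
        return False
    return any(
        all(prove(sub, rules, depth + 1, max_depth) for sub in body)
        for body in rules.get(goal, [])
    )


# The interpreter (the fixed "rules of operation") never changes below;
# only the rule data does.
rules = {"wet_grass": [["rain"], ["sprinkler"]], "sprinkler": [[]]}
print(prove("wet_grass", rules))  # True: proved via the sprinkler fact

rules["sprinkler"] = []  # unlearn the fact
print(prove("wet_grass", rules))  # False

rules["rain"] = [[]]  # learn a new fact
print(prove("wet_grass", rules))  # True
```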

6. The challenge of “super-criticality” 

The previous sections indicate that many of the issues which Turing discusses in the last section of the paper have since been explored in the AI literature. However, one of the Machine Learning challenges which Turing mentions is still entirely open.

Another simile would be an atomic pile of less than critical size: an injected idea is to correspond to a neutron entering the pile from without. Each such neutron will cause a certain disturbance which eventually dies away. If, however, the size of the pile is sufficiently increased, the disturbance caused by such an incoming neutron will very likely go on and on increasing until the whole pile is destroyed. Is there a corresponding phenomenon for minds, and is there one for machines? There does seem to be one for the human mind. The majority of them seem to be “subcritical,” i.e., to correspond in this analogy to piles of subcritical size. An idea presented to such a mind will on average give rise to less than one idea in reply. A smallish proportion are supercritical. An idea presented to such a mind may give rise to a whole “theory” consisting of secondary, tertiary and more remote ideas. Animals’ minds seem to be very definitely subcritical. Adhering to this analogy we ask, “Can a machine be made to be supercritical?”

Turing’s challenge to make a machine which is “super-critical” seems only to make sense in the context of an extreme setting of the Version 3 approach (see Section 5) to Artificial Intelligence. The situation in which a new observation “leads to a theory consisting of secondary, tertiary and more remote ideas” requires not only an alert mind but also one which is abundantly stocked with relevant background knowledge. Providing such abundant background knowledge to a machine is challenging, though the advent of the World-Wide Web offers an obvious source, as long as the available information can be accessed for purposes of inductive reasoning.
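Turing’s “less than one idea in reply” can be given a simple quantitative reading, which is our interpretation rather than Turing’s: treat each idea as independently giving rise to a random number of further ideas with mean m, as in a branching (Galton-Watson) process. If one idea is injected, the expected number of ideas in generation k is m^k, so the expected total is a geometric series that converges to 1/(1 - m) when m < 1 (the disturbance “dies away”) and grows without bound when m > 1. The Python sketch below simply sums that series; the function name is hypothetical.

```python
def expected_total_ideas(m: float, generations: int) -> float:
    """Expected number of ideas after one injected idea, summed over
    generations 0..generations, when each idea yields m further ideas on average."""
    return sum(m**k for k in range(generations + 1))


# Subcritical means (m < 1) settle near 1 / (1 - m); supercritical ones keep growing.
for m in (0.5, 0.9, 1.1, 1.5):
    print(m, round(expected_total_ideas(m, 50), 2))
```

On this reading, a “supercritical” machine is one whose average number of follow-on inferences per new observation exceeds one, which again points to the need for abundant background knowledge.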


7. Conclusion 


Turing closes the Mind paper with the following statement.

We can only see a short distance ahead, but we can see plenty there that needs to be done.

As the present article indicates, Turing’s vision was far from myopic. Indeed, he foresaw many of the key issues which dominated Artificial Intelligence research over the last fifty years. However, it could still be argued that there has been no convincing demonstration of a computer passing the Turing test to date. Modern computers are typically not well equipped with deep natural language facilities capable of playing the kind of parlour game which Turing describes. On the other hand, when most people these days are faced with an arcane (or even simple) question which they cannot immediately solve, they turn to the closest computer or smart phone to find an answer. The implicit assumption is that the collective power of the World-Wide Web provides a greater degree of intelligence than that provided by asking the same question of whichever person is closest to hand. Computers instantly search through voluminous encyclopedias, find objects in images, learn patterns of user behaviour and provide reasonable translations of text in foreign languages. Many of the techniques used in these tasks grew out of the research carried out by Artificial Intelligence laboratories. We have Turing to thank not only for the concept of the Universal Turing machine, which gave rise to the computer industry, but also for his visions of intelligent machines, which inspired the development of much of the software behind the digital assistants which we find around us everywhere in the modern world.